As was pointed out in the Presidential Address at this session,
in the last 6-7 years the statistical community has witnessed a number
of successful attempts to organize and classify the vast amount of
statistical literature, which has increased at an ever accelerating rate
in the last twenty years.

The pioneering works of Dr. Frank Haight, Professor Maurice Kendall,
Dr. William Buckland, Professors Patil, Lancaster, Hald, Walsh, Olkin,
Savage and several others should be particularly noted in this connection.

If I have missed certain noteworthy names from this list, and I
probably have, it is of course not intentional but due to ignorance or
absent-mindedness.

It seems, however, to the authors of the present paper that we should
move one step further and that a computerised information retrieval system
for statistical techniques and methodology is both feasible and desirable
at the present stage of development.

In addition to the printed version of our paper appearing on pages
303-306 of the Contributed Papers Volume 1, with which I will assume the
audience is familiar, I would like to report on an experiment utilising
computer techniques in what hopefully will become at least the first stage
of an Information Retrieval System for statistical distributions and their
application. First, I would like to give the background for this work.

Since 1963 Dr. N. L. Johnson and I have been engaged in compiling
A Compendium of Statistical Distributions, a three volume project, two
volumes of which are at the present time in the final stages of
proofreading; the first volume is due to appear this month.


During the course of preparation of the Compendium, we have collected
over 2000 reprints and xerox copies of various papers dealing with the
subject from over 200 publications, some of them obscure. Most of
these reprints are accompanied by abstracts taken from Mathematical Reviews
and/or Referativnyi Zhurnal, Zentralblatt für Mathematik, Statistical Abstracts
and others. On the basis of preliminary and partial investigations, it is
estimated that the major papers on Statistical Distributions are scattered
in over 335 journals. I have with me a list of these journals. It should
be noted, however, that, as is seen from Table B on page 305, the 12 basic
journals contain over 60% of the papers and the remaining 240 less than 40%.

It became evident in the course of our research that this type of
endeavor requires permanent up-dating and revision in order to justify the
great effort involved and to assure the usefulness of this work for numerous
users. We were therefore contemplating the establishment of a permanent
center to increase the operational value of the collection. As the first
stage towards this aim we decided to code the information from each of the
available papers according to a classification to be described in a moment.

This first stage took about six months, and was performed by qualified
graduate students with our assistance.

In connection with the process of coding we have the following general
comments. To determine the content of an article it was necessary to read
many of them completely. This was especially true when beginning a folder of
articles on a distribution not yet coded. In retrospect, the first few
distributions took a very long time to code. When more than five distributions
were completed (and more than 500 articles coded) the task of coding was
more easily accomplished. The average rate is estimated at 15 minutes per
article. It is readily admitted that someone very experienced in the field
could have done the coding more quickly. However, this would be done at the
expense of any interest in the mathematical content of the papers and would
be educationally unrewarding. The question also arises as to the advisability
(even the possibility) of coding at a more rapid pace for long periods of
time (6 or 8 hours a day). It is believed that this attitude of coding
at a more relaxed pace is in keeping with the motivation for this computerised
file. Namely, someone makes an accurate list of the content of a large number
of articles so that many can have access to this information without an
enormous investment of time on the part of many.

The information taken from 2000 papers is now being processed in coded
form on IBM cards, and we are ready in principle to proceed with the operational
activities, and to supply interested institutions and individuals with
information on any distribution and/or specific characteristic of the distribution
such as mathematical properties, estimation procedure, etc., details of which
will now be given.

Before discussing the details, however, I would like to point out that
as the collection grows, the manual classification of cards to supply the
requested information is planned to be replaced by a computer program to be
written by Dr. G. Koch of the Biostatistics Department of U.N.C.


The computerised filing scheme will be constructed according to
principles studied in the dissertation of G. Koch entitled The Design of
Combinatorial Information Retrieval Systems for Files with Multiple-valued
Attributes, University of North Carolina, Mimeo Series No. 552. The chief
advantage of such a system is that the retrieval time for various information
requests will be almost independent, up to a certain upper bound, of the size
of the file (i.e., the number of references to be included in the bibliography),
which makes the updating rather a painless and routine task. This aspect of
the research is considered to be both of an applied and basic nature. The
basic aspect is related to the choice of the algebraic schema from which the
system is to be derived and then to the discovery of the most efficient way
of implementing it in the computer. The applied aspect is that the resulting
computerised system will be applied to a large and complex bibliography,
namely that of statistical distributions.
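
The retrieval-time property mentioned above can be illustrated with a much
simpler device than the combinatorial filing scheme itself, whose details are
not given in this talk. What follows is only a sketch in Python, assuming a
plain inverted file (a substitute technique, not the scheme of the
dissertation): each (column, code) pair points directly to the references
carrying it, so a request is answered from the relevant buckets rather than
by scanning the whole file. The function names, the sample references, and
the use of column 32 for the Weibull family and column 72 for point
estimation are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_file(records):
    """Map each (column, code) pair to the set of reference ids carrying it."""
    index = defaultdict(set)
    for ref_id, attributes in records.items():
        for column, code in attributes.items():
            index[(column, code)].add(ref_id)
    return index


def lookup(index, *conditions):
    """Return the ids of references satisfying every (column, code) condition."""
    buckets = [index.get(condition, set()) for condition in conditions]
    if not buckets:
        return set()
    # Start from the smallest bucket so the intersections stay cheap.
    buckets.sort(key=len)
    result = set(buckets[0])
    for bucket in buckets[1:]:
        result &= bucket
    return result


# Hypothetical coded references: column number -> code (2 = primary subject).
records = {
    "ref-001": {32: 2, 72: 2},  # Weibull, point estimation
    "ref-002": {32: 1, 65: 2},  # Weibull mentioned, tables
    "ref-003": {24: 2, 72: 2},  # normal, point estimation
}
index = build_inverted_file(records)
weibull_estimation = lookup(index, (32, 2), (72, 2))
```

In this sketch the work done for a request depends on the size of the buckets
consulted, not on the total number of references, and adding a new reference
only appends its id to a few buckets.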

However,  even  after  the  first  stage  we  are  already  in  possession  of  a 
rather  unique  and  efficient  classification  procedure. 




I will now give the details of our classification system:


The 80 columns of an IBM card are subdivided in the following manner:


Columns 1-3     Journal identification number

Columns 7-10    First page of paper

Columns 11-60   assigned to distributions (see next page)
                are coded as follows:

                    0 if distribution is not discussed
                    1 if distribution is mentioned
                    2 if distribution is primary subject

Columns 61-78   assigned to topics (see next page) are coded
                as follows:

                    0 if topic is not discussed
                    1 if topic is mentioned
                    2 if topic is primary subject

Column 79       assigned to number of pages is coded as follows:

                    0 if 1-4 pages
                    1 if 5-8 pages
                    2 if 9-12 pages
                    3 if 13-16 pages
                    4 if 17-20 pages
                    5 if 21-24 pages
                    6 if 25-35 pages
                    7 if 36-50 pages
                    8 if more than 50 pages
                    9 if unknown

Column 80       assigned to the language of the paper is coded
                as follows:

                    1 if English
                    2 if Russian
                    3 if French
                    4 if German
                    5 if Spanish
                    6 if Italian
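
The card layout just described can be read back mechanically. The following
is a minimal sketch in Python, assuming that a card is held as a string of
exactly 80 characters, that columns are counted from 1, and that columns
11-60 carry the distribution codes (the range given in the heading of the
list of distribution families). Columns 4-6 are not described in the talk
and are passed through uninterpreted. The names decode_card and make_card
and the sample card are hypothetical.

```python
PAGE_RANGES = {
    "0": "1-4", "1": "5-8", "2": "9-12", "3": "13-16", "4": "17-20",
    "5": "21-24", "6": "25-35", "7": "36-50", "8": "more than 50",
    "9": "unknown",
}
LANGUAGES = {
    "1": "English", "2": "Russian", "3": "French",
    "4": "German", "5": "Spanish", "6": "Italian",
}


def field(card, first, last):
    """Return card columns first..last (1-indexed, inclusive)."""
    return card[first - 1:last]


def decode_card(card):
    """Split an 80-column card image into the fields described in the text."""
    if len(card) != 80:
        raise ValueError("a card image must have exactly 80 columns")
    return {
        "journal_id": field(card, 1, 3),
        "columns_4_6": field(card, 4, 6),  # meaning not given in the talk
        "first_page": int(field(card, 7, 10)),
        # column number -> code, keeping only the non-zero entries
        "distributions": {c: int(card[c - 1]) for c in range(11, 61)
                          if card[c - 1] != "0"},
        "topics": {c: int(card[c - 1]) for c in range(61, 79)
                   if card[c - 1] != "0"},
        "pages": PAGE_RANGES[card[78]],                  # column 79
        "language": LANGUAGES.get(card[79], "unknown"),  # column 80
    }


def make_card(journal_id, first_page, codes, pages_code, language_code):
    """Build a card image for testing; codes maps column number -> 1 or 2."""
    columns = ["0"] * 80
    columns[0:3] = f"{journal_id:03d}"
    columns[6:10] = f"{first_page:04d}"
    for column, code in codes.items():
        columns[column - 1] = str(code)
    columns[78] = str(pages_code)
    columns[79] = str(language_code)
    return "".join(columns)


# Hypothetical card: journal 17, first page 245, 9-12 pages, in English.
card = make_card(17, 245, {32: 2, 72: 2, 65: 1}, pages_code=2, language_code=1)
record = decode_card(card)
```

Storing only the non-zero codes keeps each decoded record small, since a
typical paper touches a handful of the 68 distribution and topic columns.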
The list of distribution families corresponding to columns 11-60:


11. Compendia and Bibliographical Sources
12. General Systems of Discrete Distributions
13. Binomial
14. Poisson
15. Geometric
16. Negative Binomial - (compound Poisson - Pascal)
17. Hypergeometric
18. Logarithmic Series
19. Compound and Generalised Discrete Distributions
20. Contagious Distributions
21. Miscellaneous Discrete
22. Multivariate Discrete Distributions
23. General Systems of Continuous Distributions
24. Normal (Gaussian)
25. Lognormal
26. Inverse Gaussian
27. Cauchy
28. χ²
29. Gamma
30. Exponential and Exponential type
31. Pareto
32. Weibull
33. Extreme Value - Gumbel - Frechet's distributions
34. Logistic
35. Laplace - (double exponential)
36. Beta
37. Rectangular (uniform) and related distributions
38. F (and z)
39. t
40. Noncentral χ²
41. Quadratic Forms in Normal Variables
42. Noncentral F
43. Noncentral t
44. Generalized t and F (under non-standard normal assumptions)
45. Distributions of Correlation Coefficients
46. Miscellaneous Continuous Distributions
47. General Multivariate Distributions and Surfaces (Bivariate)
48. General Multivariate Distributions and Surfaces (Multivariate)
49. Multivariate normal (Bivariate)
50. Multivariate normal (Trivariate)
51. Multivariate normal (Multivariate)
52. Multivariate t
53. Multivariate extreme-value
54. Multivariate exponential and Weibull
55. Multivariate Gamma
56. Wishart
57. Non-central Wishart and distribution of latent roots and vectors
58. Multivariate Beta and T
59. Non-central Multivariate Beta
60. Miscellaneous Multivariate Distributions


The list of topics corresponding to columns 61-78:


61. Origin and historical remarks
62. Definition, Distribution function, Characterizations
63. Moments, cumulants and other characteristics (excluding order statistics)
64. Genesis in models
65. Tables
66. Nomographs and Probability papers
67. Approximations to the distribution
68. Limiting forms
69. Transformation and relations to other distributions
70. Order statistics
71. Mathematical properties
72. Point estimation
73. Sequential estimation
74. Interval estimation
75. Test on parameters
76. Goodness of fit
77. Applications in statistical methodology
78. Application in sciences


This system is geared to reply to queries of the type: "list all
the papers dealing with estimation methods of the shape parameter of the
Weibull distribution." We would be able to supply up-to-date information
on such a request without much redundancy. One of the major difficulties
in our classification system, however, is that for its most efficient
functioning it is desirable that the topics and distributions be mutually
exclusive, otherwise we may over-supply with extraneous information.
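
A request of this type can be sketched as a filter over coded records. The
Python fragment below is an illustration only: it assumes records already
reduced to mappings from column number to code, takes column 32 for the
Weibull family and columns 72-74 for the estimation topics, and uses
hypothetical references A, B and C. Since the card records distributions and
topics independently, and has no column for an individual parameter, the
request is answered at the level of "Weibull and estimation". The sketch
also shows one way in which extraneous references can be supplied: a paper
that merely mentions the Weibull distribution while estimating something
else satisfies the broad form of the request.

```python
ESTIMATION_TOPICS = (72, 73, 74)  # point, sequential and interval estimation
WEIBULL = 32                      # assumed column for the Weibull family


def matches(record, distribution, topics, min_code=1):
    """True if the distribution and at least one topic are coded >= min_code."""
    if record["distributions"].get(distribution, 0) < min_code:
        return False
    return any(record["topics"].get(t, 0) >= min_code for t in topics)


def query(records, distribution, topics, min_code=1):
    """List the references answering a distribution-and-topic request."""
    return [r["ref"] for r in records
            if matches(r, distribution, topics, min_code)]


# Hypothetical coded references.
records = [
    # Weibull is the primary subject, point estimation the primary topic.
    {"ref": "A", "distributions": {32: 2}, "topics": {72: 2}},
    # Normal is primary, Weibull only mentioned; the estimation may not
    # concern the Weibull distribution at all.
    {"ref": "B", "distributions": {32: 1, 24: 2}, "topics": {72: 2}},
    # Weibull tables, no estimation.
    {"ref": "C", "distributions": {32: 2}, "topics": {65: 2}},
]

broad = query(records, WEIBULL, ESTIMATION_TOPICS)               # codes 1 or 2
narrow = query(records, WEIBULL, ESTIMATION_TOPICS, min_code=2)  # code 2 only
```

Restricting the request to primary subjects (code 2) removes reference B
here, at the price of possibly missing papers in which the Weibull
distribution is treated as a secondary subject.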

A minor problem which has not been satisfactorily solved yet (besides
the problem of possible missing information in the coded articles) is how
to code the ten digit identification number for non-journal articles from
various research centers and selections of books.


I would like to save the remaining time allocated by the Chairman
for questions and especially for a discussion period. In particular I
am very interested in your views about the practicality and usefulness
of the proposed system in your statistical activities.

From private conversations with a number of distinguished delegates
active in bibliographical statistical research as well as people who
sponsor this work, I have discovered substantial interest in it, although
some of them expressed doubts whether the time is ripe for this rather
sophisticated procedure of storing and utilizing information on statistical
distributions and suggested that in their opinion the conventional method
of books devoted to particular distributions or to particular topics of
distribution theory is just as efficient and usable for the time being.

Many others, however, were in full agreement that the delay from year to
year increases the danger that in the not too distant future, when the
information explosion reaches a certain saturation point, the inevitable
task of the initial organisation and subsequent continuity of operational
efficiency and smoothness of such a computerised system will be
significantly more difficult and complex.